We propose a parallel quantum computing mode for ensemble quantum computers. In this mode,
some qubits are in pure states while other qubits are in mixed states. It enables a single
ensemble quantum computer to perform "single-instruction-multi-data" type parallel computation.
In Grover's algorithm and Shor's algorithm, parallel quantum computing can provide additional
speedup. In addition, it makes fuller use of the qubit resources in an ensemble quantum
computer. As a result, some qubits discarded in the preparation of an effective pure state in the
Schulman-Vazirani and the Cleve-DiVincenzo algorithms can be reused.



Quantum computer realization schemes can be classified into the single-quantum-computer type, where only a single
quantum system is used, e.g. the trapped ion [1], and the ensemble quantum computer (EQC) type, such as the liquid NMR
scheme [2,3] and the solid state scheme [4], where many copies of a quantum system are used. A quantum computer uses
superposition of states and possesses quantum parallelism, which provides enormous computing power. It achieves
exponential speedup over existing classical algorithms in prime factorization [5] and in simulating quantum
systems [6]. However, for some problems the speedup is not exponential. For instance, Grover's algorithm [7], shown
to be optimal [8], achieves a square-root speedup for unsorted database search. For some other problems, a quantum
computer cannot achieve any speedup [9]. It is natural to explore additional speedup by making quantum computers work
in parallel, as in classical computation. By running many identical quantum computers in parallel, unsorted database
search can be sped up greatly [10,11]. Using Liouville space computation [12], exponentially fast search can
be achieved [13,14]. The speedup is achieved by using more resources. An EQC is a natural place to exploit this
parallelism because it contains many molecules. Each molecule is potentially a single quantum computer, and
an EQC is potentially a collection of that many quantum computers. At present, an EQC is used as a
single-quantum-computer by means of the effective pure state technique [2,3], apart from the lack of projective
measurement. Although preparing an effective pure state is tedious, Cleve and DiVincenzo [17] and Schulman and
Vazirani [18] have proposed efficient algorithms that produce a portion of the qubits in a pure state and discard
some qubits in the completely mixed state.

In this Letter, we introduce the idea of parallel quantum computing (PQC) in a single EQC. In the PQC, a subset of
the qubits is prepared in a pure state while the other qubits are in a mixed state. On the one hand, this enables
"single-instruction-multi-data" type parallel computation in a single EQC for additional speedup, for example in the
Grover and the Shor algorithms. On the other hand, the PQC uses qubits in mixed states and makes fuller use of the
qubit resources. For instance, the qubits discarded in the Cleve-DiVincenzo [17] and the Schulman-Vazirani [18]
algorithms can now be reused. The PQC is the classical parallel operation of many single-quantum-computers.

We introduce notation first. We call a term in a superposed state a component, for instance $|\psi_0\rangle$ in
$a|\psi_0\rangle + b|\psi_1\rangle$, and a term in a density matrix a constituent, for instance
$|\psi_0\rangle\langle\psi_0|$ in $p_0|\psi_0\rangle\langle\psi_0| + p_2|\psi_2\rangle\langle\psi_2|$. We can divide
an $n$-qubit system into two parts, one with $n_1$ qubits and the other with $n_2$ qubits, where $n_1 + n_2 = n$.
The state of this $n$-qubit system may be represented by $|j_1, j_2\rangle$, where $|j_1\rangle$ is the state of the
first $n_1$ qubits and $|j_2\rangle$ is the state of the latter $n_2$ qubits. We can also combine the two parts and
represent the state as $|j_{12}\rangle = |j_1 j_2\rangle$. We use binary and decimal representations interchangeably.
For instance, a 4-qubit state with $n_1 = n_2 = 2$ can be represented as
$|01, 10\rangle = |0110\rangle = |1, 2\rangle = |6\rangle$, where the first and third are the separated binary and
decimal forms, whereas the second and fourth are the combined binary and decimal forms, respectively.
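The conversion between the separated and the combined forms is plain bit manipulation. The following minimal sketch illustrates it; the function names are hypothetical, and it assumes that the $n_1$-register holds the high-order bits of the combined index, as in the 4-qubit example above.

```python
def combine_index(j1, j2, n2):
    """Combined decimal index of |j1, j2>, with j1 in the high-order bits."""
    return (j1 << n2) | j2


def split_index(j12, n2):
    """Inverse of combine_index: return (j1, j2) for a combined index."""
    return j12 >> n2, j12 & ((1 << n2) - 1)


def to_binary(j, width):
    """Binary string of j, zero-padded to the given number of qubits."""
    return format(j, "0{}b".format(width))


# The 4-qubit example with n1 = n2 = 2: |01, 10> = |0110> = |1, 2> = |6>.
j12 = combine_index(1, 2, n2=2)
print(j12, to_binary(j12, 4), split_index(j12, n2=2))
```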

We then describe the ensemble measurement, which is a generalization of that used in Liouville space computation
[12,15]. Assume that an EQC can detect the transition signal from a single molecule. For a molecule with $n + 1$
qubits, one qubit is used as the ancilla qubit and is labelled 0. The Hamiltonian of the ancilla qubit is

$$H_0 = \omega_0 I_z^0 + 2\pi \sum_{k>0} J_{0k} I_z^0 I_z^k, \qquad (1)$$

where $J_{0k}$ is the J-coupling constant between the ancilla and the $k$-th qubit, and $I_z^j$ is the z-component of
the spin operator for the $j$-th qubit. The transition frequency of the ancilla qubit depends on the state of the
remaining $n$ qubits. If the ancilla qubit transition occurs with the $n$ qubits in state
$|i_1 i_2 \cdots i_n\rangle$, its transition frequency is $\omega_0 + \pi \sum_{k=1}^{n} J_{0k} (-1)^{i_k}$. This
transition produces a peak in the ancilla qubit spectrum. For instance, the $n$-qubit state
$|i_1 i_2 \cdots i_n\rangle = |00\ldots0\rangle$ corresponds to the highest frequency
$\omega_0 + \sum_{k=1}^{n} \pi J_{0k}$, and the state $|i_1 i_2 \cdots i_n\rangle = |11\ldots1\rangle$ corresponds to
the lowest frequency $\omega_0 - \sum_{k=1}^{n} \pi J_{0k}$. Thus one can tell the state of the $n$ qubits
$|i_1 i_2 \cdots i_n\rangle$ by looking at the frequency of the multiplet component. Moreover, the ancilla qubit state
itself is represented by the spectral peak direction. If the ancilla qubit is in the $|0\rangle$ ($|1\rangle$) state
before the transition, then the spectral peak is upward (downward).
The state in the PQC can be a superposition of basis states, say
$\sum_{j_2=0}^{N_2-1} c_{j_1 j_2} |j_1 j_2\rangle$. In this state the first $n_1$ qubits are in the state
$|j_1\rangle$ and the latter $n_2$ qubits are in a superposed state of the $n_2$-register. When we measure the ancilla
qubit, we will observe only one transition. The transition frequency is random, taking one of the frequencies
corresponding to the $n_2$-qubit states $|0\rangle, \ldots, |N_2 - 1\rangle$, because the $n$-qubit state collapses
into one of the $N_2$ basis states $|j_1 j_2\rangle = |j_1, j_2\rangle$ randomly with probability
$|c_{j_1 j_2}|^2$. When the superposed state is transformed into a single basis state,
the transition frequency is definite and determined by Eq. (1). This ancilla qubit spectrum method will
serve as the ensemble measurement throughout this paper. It tells the ancilla qubit state by the peak direction,
and the $n$-qubit state by the transition frequency.
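The readout rule can be illustrated with a short sketch that maps each basis state of the $n$ qubits to an ancilla transition frequency and back. It assumes the frequency takes the form $\omega_0 + \pi \sum_k J_{0k} (-1)^{i_k}$, which is consistent with the highest and lowest frequencies quoted above. The function names, the value $\omega_0 = 0$ (rotating frame) and the coupling constants are hypothetical choices made only for illustration.

```python
import math
from itertools import product


def transition_frequency(bits, omega0, couplings):
    """Ancilla transition frequency when the n qubits are in the basis state `bits`."""
    return omega0 + math.pi * sum(J * (-1) ** b for J, b in zip(couplings, bits))


def decode_state(frequency, omega0, couplings):
    """Return the basis state whose predicted peak lies closest to `frequency`."""
    candidates = product((0, 1), repeat=len(couplings))
    return min(
        candidates,
        key=lambda bits: abs(transition_frequency(bits, omega0, couplings) - frequency),
    )


# Hypothetical parameters: three qubits with distinct couplings, so that
# all 2**3 peaks of the multiplet are resolved.
OMEGA0 = 0.0
COUPLINGS = [8.0, 4.0, 2.0]

spectrum = {
    bits: transition_frequency(bits, OMEGA0, COUPLINGS)
    for bits in product((0, 1), repeat=len(COUPLINGS))
}
for bits, freq in sorted(spectrum.items(), key=lambda item: -item[1]):
    print(bits, round(freq, 3))
```

The decoding step only works when the couplings are such that all peaks are distinct, which is why unequal values were chosen here.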

Our quantum computer model is an EQC with $N_1 = 2^{n_1}$ molecules. Each molecule can be operated on and measured.
It has $n + m + 1$ qubits, divided into three parts: one ancilla qubit, a function register with $m$ qubits, and an
argument register with $n$ qubits. The argument register is further divided into two parts: one with $n_1$ qubits,
called the $n_1$-register, and another with $n_2$ qubits, called the $n_2$-register, with $n = n_1 + n_2$. In general,
before a computation, the function register and the ancilla qubit are prepared in the pure state $|0\rangle$. The
argument register is in a mixed state with $N_1$ constituents. Each constituent is characterized by the state of the
$n_1$-register. The $n_2$-register in a given
constituent is in a superposed state of its $N_2 = 2^{n_2}$ basis states. The density operator of the ensemble is

$$\rho = \frac{1}{N_1} \sum_{j_1=0}^{N_1-1} \Big[ \sum_{j_2=0}^{N_2-1} c_{j_1 j_2} |0, j_1, j_2\rangle \Big] \Big[ \sum_{j_2'=0}^{N_2-1} c^*_{j_1 j_2'} \langle 0, j_1, j_2'| \Big], \qquad (2)$$

where in $|i, j_1, j_2\rangle$, $i$, $j_1$ and $j_2$ are the states of the function, the $n_1$- and the
$n_2$-registers respectively, and $\sum_{j_2=0}^{N_2-1} |c_{j_1 j_2}|^2 = 1$. The ancilla qubit state is not written
out explicitly. In this EQC, there are $N_1$ constituents and $N_1$ molecules. Each molecule is in a different state,
$\sum_{j_2=0}^{N_2-1} c_{j_1 j_2} |0, j_1, j_2\rangle$, which is a superposition of $N_2$
computational basis states. In general, a quantum computation performs unitary transformations on both the
argument and the function registers. Denoting this transformation as $U_c$, the quantum computation on state (2) is

$$\rho \rightarrow U_c \, \rho \, U_c^{\dagger}. \qquad (3)$$

An ensemble measurement is then performed to read out the result.

The quantum computation represented in Eq. (3) on the ensemble (2) is defined as parallel quantum computing.
In effect, it is $N_1$ quantum computers working in parallel. The computation instruction $U_c$ is the same for all
molecules, but the data, that is, the numbers represented by the different molecules, are different. Hence, the PQC
is the counterpart of the single-instruction-multi-data type of parallel computation in classical computation. The
state (2) is the most general initial state, and in most applications the following simplified state is sufficient:
the $n_1$-register in the completely mixed state $\sum_{j_1=0}^{N_1-1} (1/N_1) |j_1\rangle\langle j_1|$ and the
$n_2$-register in the equally weighted superposed state $\sum_{j_2=0}^{N_2-1} \sqrt{1/N_2}\, |j_2\rangle$. In this
case, $c_{j_1 j_2} = 1/\sqrt{N_2}$ for all possible $j_1$ and $j_2$.

Now we apply the PQC to the Grover algorithm. Suppose the marked state is $|j_1^0 j_2^0\rangle$. Only one qubit is
required for the function register in this algorithm. This qubit is also used as the ancilla qubit for the ensemble
measurement. Preparing the function register in the $|0\rangle$ state, the $n_2$-register in the equally weighted
superposed state, and the $n_1$-register in the completely mixed state, we have

$$\rho = \frac{1}{N_1} |0\rangle\langle 0| \sum_{j_1=0}^{N_1-1} \Big[ \sum_{j_2=0}^{N_2-1} \sqrt{1/N_2}\, |j_1 j_2\rangle \Big] \Big[ \sum_{j_2'=0}^{N_2-1} \sqrt{1/N_2}\, \langle j_1 j_2'| \Big]. \qquad (4)$$

In this way, we divide the database into $N_1$ sub-databases, each with $N_2$ items. We apply a zero-failure-rate
Grover algorithm [16] to the ensemble with $J$ iterations, where $J - 1$ is the integer part of
$(\pi/2 - \beta)/(2\beta)$, $J$ is approximately $\pi\sqrt{N_2}/4$, and $\beta = \arcsin(1/\sqrt{N_2})$. In this
modified Grover algorithm, each iteration consists of four steps: 1) apply the query to the whole $n$-qubit argument
register and, on condition that the query is satisfied, rotate the phase of the marked state through the angle
$\phi = 2 \arcsin\left[\sqrt{N_2} \sin\frac{\pi}{4J+2}\right]$ ($\phi$ is slightly smaller than $\pi$); 2) make a
Hadamard transformation on the $n_2$-register; 3) make a phase rotation through the angle $\phi$ on the
$|0\ldots0\rangle$ basis state of the $n_2$-register; 4) make a Hadamard transformation on the $n_2$-register again.
If a sub-database does not contain the marked state, the above operation does not produce any observable effect. The
constituent that contains the marked item has its $n_1$-register in state $|j_1^0\rangle$. The modified Grover
algorithm transforms its $n_2$-register from the equally weighted superposed state into the single state
$|j_2^0\rangle$, so that the constituent is in the marked state $|j_1^0 j_2^0\rangle$. At the end of the modified
Grover algorithm, one makes a further query and, on condition that the query is satisfied, flips the function
register. The density matrix becomes

$$\rho_f = \frac{1}{N_1} |0\rangle\langle 0| \sum_{j_1 \neq j_1^0} \Big[ \sum_{j_2=0}^{N_2-1} \sqrt{1/N_2}\, |j_1 j_2\rangle \Big] \Big[ \sum_{j_2'=0}^{N_2-1} \sqrt{1/N_2}\, \langle j_1 j_2'| \Big] + \frac{1}{N_1} |1\rangle\langle 1| \, |j_1^0 j_2^0\rangle\langle j_1^0 j_2^0|.$$

Finally, measuring the ancilla qubit, one obtains $N_1$ transition peaks in the spectrum, one from each constituent.
For those constituents without the marked item, each peak is upward and its transition frequency is random, being one
of those corresponding to the states $|j_1\, 0\rangle, \ldots, |j_1\, N_2 - 1\rangle$. The constituent with the marked
item is in a unique state and produces a downward peak with a definite frequency corresponding to the state
$|j_1^0 j_2^0\rangle$. The algorithm thus finds the marked state with certainty.
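The four-step iteration can be checked numerically on a single sub-database. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the iteration count and phase follow the expressions $J - 1 = \lfloor(\pi/2 - \beta)/(2\beta)\rfloor$ and $\phi = 2\arcsin[\sqrt{N_2}\sin(\pi/(4J+2))]$ with $\beta = \arcsin(1/\sqrt{N_2})$, and steps 2) to 4) are applied in one go through the identity $W I_0 W = 1 + (e^{i\phi} - 1)|s\rangle\langle s|$, where $W$ is the Hadamard transformation and $|s\rangle$ the equally weighted superposition.

```python
import cmath
import math


def zero_failure_parameters(n2):
    """Return (J, phi) for a sub-database of N2 = 2**n2 items."""
    N2 = 2 ** n2
    beta = math.asin(1 / math.sqrt(N2))
    J = int((math.pi / 2 - beta) / (2 * beta)) + 1
    # min() guards against rounding pushing the arcsin argument just above 1.
    phi = 2 * math.asin(min(1.0, math.sqrt(N2) * math.sin(math.pi / (4 * J + 2))))
    return J, phi


def run_sub_database(n2, marked=None):
    """Simulate J iterations on one constituent; return the final probabilities.

    `marked` is the index of the marked item inside this sub-database,
    or None when the sub-database does not contain it.
    """
    N2 = 2 ** n2
    J, phi = zero_failure_parameters(n2)
    phase = cmath.exp(1j * phi)
    state = [complex(1 / math.sqrt(N2))] * N2
    for _ in range(J):
        # Step 1: conditional phase rotation of the marked state.
        if marked is not None:
            state[marked] *= phase
        # Steps 2-4: Hadamard, phase rotation of |0...0>, Hadamard.
        mean = sum(state) / N2
        state = [amp + (phase - 1) * mean for amp in state]
    return [abs(amp) ** 2 for amp in state]


for n2 in (1, 3, 6):
    J, phi = zero_failure_parameters(n2)
    probs = run_sub_database(n2, marked=2 ** n2 - 1)
    print(n2, J, round(phi, 4), round(probs[-1], 12))
```

Calling `run_sub_database(n2, marked=None)` models a constituent without the marked item: the iteration then only multiplies the state by a global phase, so the probabilities stay uniform, in line with the statement that no observable effect is produced.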

The number of queries is about $\pi\sqrt{N_2}/4 = \pi\sqrt{N/N_1}/4$. This is only $1/\sqrt{N_1}$ of what the
standard Grover algorithm requires. This is because there are $N_1$ single-quantum-computers searching in parallel,
each in a reduced database with only $N/N_1 = N_2$ items. It takes $\pi\sqrt{N/N_1}/4$ steps for each
single-quantum-computer to complete the search. In one extreme, $n_1 = 0$, there is only a single molecule and the
number of queries is $\pi\sqrt{N}/4$, which is just that of the standard Grover algorithm. In the other extreme,
$n_1 = n$ and $n_2 = 0$, the EQC contains $N = 2^n$ molecules in the completely mixed state, and only a single query
is needed. This is just the Liouville space computer fetching algorithm proposed recently [14]. In Liouville space
computation [12], no superposition of the computational basis states is used. Each molecule can also be viewed as a
reversible classical computer that can be realized quantum mechanically [19], or simply implemented directly using a
classical Turing machine with three tapes [20]. If $n_1 = n - 1$ and $n_2 = 1$, the algorithm finds the marked item
with just two queries. Clearly, the speedup is achieved at the expense of more molecules. The number of queries
$N_q$ and the number of molecules $N_1$ satisfy $N_q^2 \times N_1 = \mathrm{constant}$.
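The trade-off between queries and molecules follows directly from the query count. As a brief check, assuming $N_2$ is large enough that the count is dominated by the roughly $\pi\sqrt{N_2}/4$ Grover iterations:

$$N_q \approx \frac{\pi}{4}\sqrt{\frac{N}{N_1}} \quad\Longrightarrow\quad N_q^2 \, N_1 \approx \frac{\pi^2}{16}\, N .$$

The right-hand side depends only on the database size $N$ and not on how the $n$ qubits are split into $n_1$ and $n_2$. Quadrupling the number of molecules therefore halves the number of queries.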

Suppose we fix the number of molecules in an EQC at $N_e$. Then, in order that each constituent is occupied by at
least one molecule, $n_1$ cannot be larger than $\log_2 N_e$; otherwise there will be constituents without any
occupying molecule. We assume that the qubit number $n$ is very large, $N_e < 2^n$. The maximum value of $n_1$ is
$\log_2 N_e$. A natural estimate of the bound is to set $N_e = N_A$, the Avogadro constant. This gives $n_1 < 79$.
In principle, we can vary $n_1$ from 0 to $\log_2 N_e$ so that the functioning of the EQC changes. When $n_1 = 0$,
all $N_e$ molecules are in the same pure state and the EQC works as a single-quantum-computer. Most NMR EQC quantum
computation experiments done so far achieve this effect using the effective pure state technique. When $n_1 = 1$,
the ensemble is divided into two sub-ensembles, each with $N_e/2$ molecules. Each sub-ensemble works as a
single-quantum-computer, and the whole ensemble works as two single-quantum-computers in parallel. When
$n_1 = \log_2 N_e$, the ensemble works as $N_e$ single-quantum-computers working in parallel.
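The bound on $n_1$ is a one-line calculation. The sketch below uses hypothetical function names and takes the Avogadro constant as $6.02214076 \times 10^{23}$, on the assumption that this is the estimate intended for $N_e$.

```python
import math

AVOGADRO = 6.02214076e23  # assumed value of N_A, used as the estimate for N_e


def max_n1(num_molecules):
    """Largest n1 such that each of the 2**n1 constituents holds at least one molecule."""
    return math.floor(math.log2(num_molecules))


def molecules_per_constituent(num_molecules, n1):
    """Average size of each sub-ensemble when the n1-register has n1 qubits."""
    return num_molecules / 2 ** n1


print(math.log2(AVOGADRO), max_n1(AVOGADRO))
print(molecules_per_constituent(AVOGADRO, 1))
```

Since $\log_2 N_A$ lies just below 79, the largest admissible integer value is $n_1 = 78$, consistent with the bound $n_1 < 79$.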

In the above discussion, a single molecule and an ensemble of many molecules in a pure state are both treated as a
single-quantum-computer. We point out here that the EQC can do more by implementing the parallel operation proposed
in Refs. [10,11]. In these works, the Grover algorithm is run on $k$ identical quantum computers in parallel. This
is equivalent to repeating the algorithm on a single-quantum-computer $k$ times. We call this parallel algorithm the
repetition parallel algorithm (RPA). For instance, in Ref. [10], by running one iteration of Grover's algorithm on
$k$ identical quantum computers simultaneously and then measuring these quantum computers simultaneously,
the marked state can be found by picking out the state that most quantum computers point to. This is because the
marked state appears about $9k/N$ times in the outcome, whereas any other state appears about $k/N$ times. When
$k = O(N \log N)$, the probability that the marked state occurs more often than any other state approaches unity.
In Ref. [11], $k$ identical quantum computers search in parallel. In each quantum computer, the probability of
finding the marked state is amplified. Because there are $k$ quantum computers, by using the majority-vote rule one
needs fewer iterations on each quantum computer. The speedup scales as $O(\sqrt{k})$. The extent of the speedup is
the same as for the PQC algorithm. But there are several differences between the PQC and the RPA: 1) in the PQC, the
database for each quantum computer is reduced from $N$ to $N/N_1$, whereas in the RPA the database size is always
$N$; 2) in the PQC, the $n_1$ qubits are in a mixed state, whereas in the RPA all qubits are in pure states. This
gives the PQC the advantage of making fuller use of qubit resources, as we will explain later; 3) the PQC algorithm
has a full success rate, whereas the RPA is probabilistic. To overcome fluctuations, the RPA requires more resources
than the PQC. For instance, for single-query searching, the PQC algorithm requires $N$ molecules, whereas the
algorithm in Ref. [10] requires $O(N \log N)$ molecules.
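The factor 9 quoted for Ref. [10] can be recovered from the standard rotation picture of Grover's algorithm. The following is a sketch for $N \gg 1$; the angle $\theta = \arcsin(1/\sqrt{N})$ is notation introduced here for convenience and is not taken from Ref. [10]. After one standard iteration,

$$P_{\mathrm{marked}} = \sin^2(3\theta) \approx \frac{9}{N}, \qquad P_{\mathrm{other}} = \frac{1 - P_{\mathrm{marked}}}{N - 1} \approx \frac{1}{N},$$

so that among $k$ simultaneously measured quantum computers the marked state is expected about $9k/N$ times and each other state about $k/N$ times.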

In reality, some number, say $N_g$, of molecules has to be used as a logical molecule. A logical molecule can be
viewed as the minimum number of molecules that acts as a single-quantum-computer. A molecule in the preceding
discussion should then be understood as a logical molecule. The number of logical molecules in an EQC is $N_e/N_g$.
In practice, an NMR EQC contains a very large number of molecules. Although the effective pure state technique
reduces the number of molecules contributing to the quantum computation, the remaining number is still much larger
than that needed for a logical molecule. Thus in ensemble quantum computation with the effective pure state
technique, it is possible to see the effect of repetition parallelism. Indeed, it has been pointed out that in
ensemble quantum computation, unsorted database search can be faster than the Grover algorithm [21] by trading space
resources for time resources, a reflection of the repetition parallelism. In implementing the PQC, the effective pure
state technique can also be used to prepare the $n_2 + m + 1$ qubits in a pure state.

The Shor algorithm can also be run in the PQC. The aim is to find the period $r$ of $a^x \bmod N_b$. We need two
registers, an argument register with $n$ qubits, where $N_b^2 < 2^n < 2N_b^2$, and a function register of similar
size. We divide the argument register into $n_1$ qubits in the completely mixed state and $n_2$ qubits in a pure
state. The ensemble is prepared in the state described by Eq. (4). We compute $a^x \bmod N_b$ and store the results
in the function register. After performing the Fourier transform on only the $n_2$-register, the states of the
$n_2$-register become identical in all constituents. By measuring the $n_2$-register using an ancilla qubit, the
period in the $n_2$-register, $N_2/r$, can be found. The speedup is due to two factors. First, the Fourier transform
is done on a smaller space in the $n_2$-register, and it requires only $O(n_2^2)$ steps as compared with $O(n^2)$
steps in the standard Shor algorithm. Second, there are $N_1$ constituents, so a single ensemble measurement yields
$N_1$ transitions. In the standard Shor algorithm, several runs of the algorithm are required. In the PQC, the number
of runs can be reduced by a factor of $1/N_1$. We illustrate this with a simple example with
$N_b = 15$, $a = 7$, $n = 8$, $n_1 = 2$, $n_2 = 6$. Shor's algorithm on a single-quantum-computer yields the state
$(|0\rangle + |64\rangle + |128\rangle + \ldots)(|1\rangle + |7\rangle + |4\rangle + |13\rangle)$, and the period in
the argument register is $q/r = 64$, where $q = 256$. With the PQC, the resulting state is
$\rho = [(|1\rangle + |7\rangle + |4\rangle + |13\rangle)]\,([00] + [01] + [10] + [11])\,[(|0\rangle + |16\rangle + |32\rangle + |48\rangle)]$.
Upon measurement, 4 transitions from the $n_2$-register appear, which is equivalent to running the algorithm with
6 qubits 4 times. For the PQC operation of the Shor algorithm, however, there is a restriction on $n_1$: it should
not be large, otherwise the Fourier transformation on the $n_2$ qubits will not achieve the desired destructive
interference.
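The numbers in this example can be reproduced classically. The sketch below is an illustration only: the function names are hypothetical, the $n_1$-register is assumed to hold the high-order bits of the argument $x = j_1 N_2 + j_2$, and the Fourier transform is evaluated as a plain discrete Fourier transform over the $n_2$-register after fixing one function value and one constituent $j_1$.

```python
import cmath


def order(a, modulus):
    """Smallest r > 0 with a**r = 1 (mod modulus)."""
    r = 1
    while pow(a, r, modulus) != 1:
        r += 1
    return r


def n2_register_peaks(a, modulus, n2, j1, f_value, tol=1e-9):
    """Indices k with non-zero Fourier amplitude in the n2-register.

    The constituent is labelled by j1, and the function register is taken
    to hold f_value, which selects the j2 with a**(j1*N2 + j2) % modulus == f_value.
    """
    N2 = 2 ** n2
    comb = [1.0 if pow(a, j1 * N2 + j2, modulus) == f_value else 0.0 for j2 in range(N2)]
    peaks = []
    for k in range(N2):
        amp = sum(c * cmath.exp(2j * cmath.pi * j2 * k / N2) for j2, c in enumerate(comb))
        if abs(amp) > tol:
            peaks.append(k)
    return peaks


A, N_B, N1_QUBITS, N2_QUBITS = 7, 15, 2, 6
r = order(A, N_B)
print([pow(A, x, N_B) for x in range(r)], r, 2 ** 8 // r, 2 ** N2_QUBITS // r)
for j1 in range(2 ** N1_QUBITS):
    print(j1, n2_register_peaks(A, N_B, N2_QUBITS, j1, f_value=7))
```

All four constituents give the same set of peaks in the $n_2$-register here because $r$ divides $N_2$. When $n_2$ is too small for the register to resolve the period, this no longer holds, which is the restriction on $n_1$ mentioned above.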

The PQC uses mixed states in general. One can take advantage of this to make fuller use of the qubit resources
in an EQC. In the Cleve-DiVincenzo [17] and the Schulman-Vazirani [18] algorithms, $O(n)$ qubits are prepared in a
pure state while some qubits ($O(\sqrt{n})$ in [17] and $O(n)$ in [18]) have to be in the completely mixed state and
be discarded. In the PQC, these qubits can be reused. This gives a natural criterion for dividing $n$ into $n_1$ and
$n_2$: we can use these discarded qubits as the $n_1$-register. This considerably increases the number of qubits
usable in an EQC.